26 research outputs found

    Kernel Graph Convolutional Neural Networks

    Graph kernels have been successfully applied to many graph classification problems. Typically, a kernel is first designed, and an SVM classifier is then trained on the features defined implicitly by this kernel. This two-stage approach decouples data representation from learning, which is suboptimal. Convolutional Neural Networks (CNNs), on the other hand, can learn their own features directly from the raw data during training; unfortunately, they cannot handle irregular data such as graphs. We address this challenge by using graph kernels to embed meaningful local neighborhoods of the graphs in a continuous vector space. A set of filters is then convolved with these patches and pooled, and the output is passed to a feedforward network. With limited parameter tuning, our approach outperforms strong baselines on 7 out of 10 benchmark datasets. Comment: Accepted at ICANN '1
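The pipeline the abstract describes (embed node neighborhoods, convolve filters over the resulting patches, pool, feed forward) can be sketched as follows. This is an illustrative toy, not the authors' implementation: the neighborhood embedding here is a simple degree histogram standing in for a proper graph kernel, and the weights are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

def neighborhood_patches(adj, radius=1, dim=4):
    """One vector per node: a histogram of degrees within its neighborhood."""
    n = adj.shape[0]
    # Nodes reachable within `radius` hops (including the node itself).
    reach = np.linalg.matrix_power(adj + np.eye(n, dtype=int), radius) > 0
    degrees = adj.sum(axis=1)
    patches = np.zeros((n, dim))
    for i in range(n):
        hist, _ = np.histogram(degrees[reach[i]], bins=dim, range=(0, dim))
        patches[i] = hist
    return patches

def classify(adj, W_filters, W_out):
    patches = neighborhood_patches(adj)             # (n_nodes, dim)
    responses = np.maximum(patches @ W_filters, 0)  # filters + ReLU
    pooled = responses.max(axis=0)                  # max-pool over nodes
    logits = pooled @ W_out                         # feedforward readout
    return logits.argmax()

# Tiny example: a 4-node cycle graph with random (untrained) weights.
adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
W_filters = rng.standard_normal((4, 8))
W_out = rng.standard_normal((8, 2))
print(classify(adj, W_filters, W_out))  # one of the two class labels
```

In practice the patch embedding would come from a graph kernel (e.g. shortest-path or Weisfeiler-Lehman features) and the filter and readout weights would be learned end-to-end.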

    Boosting Tricks for Word Mover's Distance

    Due to the COVID-19 pandemic, the physical meeting of ICANN 2020 was postponed; the event was rescheduled for September 2021 in Bratislava, Slovakia. Word embeddings have opened a new path for creating novel approaches to traditional problems in the natural language processing (NLP) domain. However, using word embeddings to compare text documents remains a relatively unexplored topic, with Word Mover's Distance (WMD) being the most prominent tool used so far. In this paper, we present a variety of tools that can further improve the computation of distances between documents based on WMD. We demonstrate that alternative stopword lists, cross document-topic comparison, deep contextualized word vectors, and convex metric learning constitute powerful tools that can boost WMD.
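To make the underlying distance concrete, here is a hedged sketch of the Relaxed Word Mover's Distance (RWMD), a cheap lower bound on WMD in which each word sends all its mass to the closest word in the other document. The tiny hand-made word vectors are purely illustrative, not a trained embedding, and this is not the paper's code.

```python
import numpy as np

def rwmd(doc_a, doc_b, vectors):
    """Relaxed Word Mover's Distance between two documents,
    given a word -> vector mapping and uniform word weights."""
    A = np.array([vectors[w] for w in doc_a])
    B = np.array([vectors[w] for w in doc_b])
    # Pairwise Euclidean distances between word vectors.
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    # Each word travels to its nearest counterpart, in both directions.
    d_ab = cost.min(axis=1).mean()
    d_ba = cost.min(axis=0).mean()
    return max(d_ab, d_ba)  # the tighter of the two one-sided bounds

# Toy 2-d "embeddings": royalty words cluster apart from fruit words.
vectors = {
    "king":  np.array([0.9, 0.1]),
    "queen": np.array([0.8, 0.2]),
    "apple": np.array([0.1, 0.9]),
    "fruit": np.array([0.2, 0.8]),
}
print(rwmd(["king", "queen"], ["queen", "king"], vectors))  # 0.0: same words
print(rwmd(["king", "queen"], ["apple", "fruit"], vectors))  # large: distant clusters
```

The full WMD replaces the nearest-neighbor shortcut with an optimal-transport problem over word frequencies; the boosting tricks in the paper (stopword choice, contextual vectors, metric learning) plug into the same cost matrix.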

    New Representations, Regularization and Distances for Text Classification

    Text has been the dominant way of storing data in computer systems and sending information around the Web. Extracting meaningful representations out of text has been a key element of modelling language in order to tackle NLP tasks like text classification. These representations can then form groups that one can use for supervised learning problems; more specifically, one can utilize these linguistic groups for regularization purposes. Finally, these structures can be of help in another important field: distance computation between text documents.

    The main goal of this thesis is to study the aforementioned problems: first, by examining new graph-based representations of text; next, by studying how groups of these representations can help regularization in machine learning models for text classification; and last, by dealing with sets and measuring distances between documents, utilizing our proposed linguistic groups as well as graph-based approaches.

    In the first part of the thesis, we study graph-based representations of text. Turning text into graphs is not trivial and predates the introduction of word embeddings to the NLP community. In our work, we show that graph-based representations of text can effectively capture relationships such as order, semantics, or syntactic structure. Moreover, they can be created quickly while offering great versatility for multiple tasks.

    In the second part, we focus on structured regularization for text. Textual data suffer from the dimensionality problem, creating huge feature spaces. Regularization is critical for any machine learning model, as it can address overfitting. In our work, we present novel approaches for text regularization, introducing new groups of linguistic structures and designing new algorithms.

    In the last part of the thesis, we study new methods to measure distance in the word embedding space. First, we introduce diverse methods to boost comparison between documents that consist of word vectors. Next, representing the comparison of documents as a weighted bipartite matching, we show how we can learn hidden representations and improve results for the text classification task. Finally, we conclude by summarizing the main points of the total contribution and discussing future directions.
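The weighted bipartite matching view of document comparison mentioned in the abstract can be sketched with SciPy's Hungarian-algorithm solver. This is an illustrative sketch under simple assumptions (uniform word weights, toy 2-d vectors), not the thesis implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_distance(A, B):
    """Minimum-cost one-to-one matching between the word vectors
    (rows) of two documents, solved as a bipartite assignment."""
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return cost[rows, cols].sum()

# Two toy "documents", each a stack of 2-d word vectors.
doc1 = np.array([[0.0, 0.0], [1.0, 0.0]])
doc2 = np.array([[1.0, 0.1], [0.0, 0.1]])
print(matching_distance(doc1, doc2))  # ≈ 0.2: each word pairs with its near neighbor
```

Unlike the transport formulation behind WMD, the matching constrains each word to pair with exactly one word in the other document, which is what makes learning hidden representations over the matched pairs tractable.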


    Cyber Threats to Industrial IoT: A Survey on Attacks and Countermeasures

    In today's Industrial Internet of Things (IIoT) environment, where different systems interact with the physical world, the state proposed by the Industry 4.0 standards can lead to escalating vulnerabilities, especially when these systems receive data streams from multiple intermediaries, requiring multilevel security approaches in addition to link encryption. At the same time, taking into account the heterogeneity of the systems in the IIoT ecosystem and the non-standardized interoperability in terms of hardware and software, serious issues arise as to how to secure these systems. Given that the protection of industrial equipment is a requirement inextricably linked to technological developments and the use of the IoT, it is important to identify the major vulnerabilities and the associated risks and threats, and to suggest the most appropriate countermeasures. In this context, this study describes attacks against IIoT systems and provides a thorough analysis of the solutions to these attacks proposed in the most recent literature.